Discriminate the Falsely Predicted Protein–Coding Genes in Aeropyrum Pernix K1 Genome Based on Graphical Representation
Abstract
The question of how many protein-coding genes exist in the Aeropyrum pernix K1 genome has puzzled scientists since 1999. In this paper, we attempt to re-identify the protein-coding genes in this genome by proposing a modified method based on the I-TN curve. All 727 experimentally validated protein-coding genes and 726 of the corresponding negative samples are correctly predicted, giving a self-test accuracy of 99.93%. In the jackknife test, two positive samples and two negative samples are falsely predicted, giving a cross-validation accuracy of 99.72%. In the testing set, all 132 putative genes are correctly predicted as protein-coding and 14 of the 841 hypothetical genes are predicted as non-coding, which reduces the number of protein-coding genes from 1700 to 1686. Further analysis shows that the performance of the reannotation algorithm is comparable to that of other prevalent programs, while the present method is simple and efficient. We also apply the reannotation algorithm, trained on Aeropyrum pernix K1, to the Chlorobium tepidum TLS genome, where 217 hypothetical genes are predicted as non-coding. Sequence analysis indicates that most of them are random sequences that were falsely predicted as protein-coding genes. In addition, we analyze the influence of artificial parameters on graphical representation approaches, which may provide helpful information for related research.
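The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not the authors' code. It assumes that the negative set has the same size as the positive set (727 samples), which the abstract implies but does not state. The function name `accuracy` is illustrative.

```python
def accuracy(correct: int, total: int) -> float:
    """Return the percentage of correctly classified samples."""
    return 100.0 * correct / total


positives = 727  # experimentally validated protein-coding genes (from the abstract)
negatives = 727  # ASSUMPTION: one negative sample per positive sample
total = positives + negatives

# Self-test: all positives and 726 negatives are classified correctly.
self_test = accuracy(727 + 726, total)

# Jackknife test: two positives and two negatives are misclassified.
jackknife = accuracy(total - 4, total)

# Gene count: validated + putative + hypothetical, minus the 14 predicted non-coding.
annotated = 727 + 132 + 841
remaining = annotated - 14

print(f"{self_test:.2f}")  # 99.93
print(f"{jackknife:.2f}")  # 99.72
print(annotated, remaining)  # 1700 1686
```

Under that assumption, the two accuracies and the reduced gene count all match the values reported in the abstract.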
Similar resources
Gene recognition based on nucleotide distribution of ORFs in a hyper-thermophilic crenarchaeon, Aeropyrum pernix K1.
The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be categorized into three clusters (A, B, C), according to their nucleotide composition at three codon positions. Coding potential was found to be responsible for the phenomenon of three clusters in a 9-dimensional space derived from the nucleotide composition of ORFs: ORFs assigned to cluster A are codi...
Characterization of a whole set of tRNA molecules in an aerobic hyper-thermophilic Crenarchaeon, Aeropyrum pernix K1.
The tRNA molecule has an important role in translation, the function of which is to carry amino acids to the ribosomes. It is known that tRNA is transcribed from tRNA genes, some of which, in Eukarya and Archaea, contain introns. A computational analysis of the complete genome of Aeropyrum pernix K1 predicted the presence of 14 intron-containing tRNA genes. To elucidate whether these introns ar...
Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95 degrees C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly ampl...
EVOLUTION AND tRNA RECOGNITION OF THREONYL-tRNA SYNTHETASE FROM AN EXTREME THERMOPHILIC ARCHAEON, Aeropyrum pernix K1
An extreme thermophilic archaeon, Aeropyrum pernix K1, possesses two possible threonyl-tRNA synthetase genes. Sequence homology analysis of these genes against threonyl-tRNA synthetases from other species showed that the shorter gene did not possess motif-2 and motif-3 of the catalytic core, which are conserved in class II aminoacyl-tRNA synthetases. On the other hand, the longer gene had almost all amino aci...
Molecular recognition of proline tRNA by prolyl-tRNA synthetase from hyperthermophilic archaeon, Aeropyrum pernix K1.
To investigate the recognition mechanism of tRNA(Pro) by prolyl-tRNA synthetase from the hyperthermophilic archaeon Aeropyrum pernix K1, various tRNA(Pro) transcripts were prepared using an in vitro transcription system. These transcripts were aminoacylated with proline by overexpressed A. pernix prolyl-tRNA synthetase. From the prolylation experiments, recognition elements of A. pernix tRNA(Pro) were deter...